One-sample t-test (Student’s t-test)#

The one-sample t-test answers:

Is the population mean plausibly equal to a reference value μ0?

Use it when you have one sample of numeric measurements, the population standard deviation is unknown, and you care about the mean.

Learning goals#

  • Know when the one-sample t-test is the right tool (and when it is not).

  • Write the null and alternative hypotheses (two-sided / one-sided).

  • Understand the t-statistic as a signal-to-noise ratio.

  • Interpret the p-value correctly (what it is and what it is not).

  • Implement the test with NumPy only (including an estimated p-value).

  • Build intuition with Plotly visuals (p-value area, df effects, sampling behavior, power).

Prerequisites#

  • Basic descriptive stats: mean, (sample) standard deviation.

  • The idea of a sampling distribution (statistics vary from sample to sample).

  • Optional: the z-test (mean test when σ is known).

import math
import os
import platform

import numpy as np
import plotly.graph_objects as go
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(7)

print("Python", platform.python_version())
print("NumPy ", np.__version__)
try:
    import plotly

    print("Plotly", plotly.__version__)
except Exception:
    pass
try:
    import scipy

    print("SciPy ", scipy.__version__)
except Exception:
    pass
Python 3.12.9
NumPy  1.26.2
Plotly 6.5.2
SciPy  1.15.0

Intuition: signal-to-noise#

You observe a sample x1, x2, …, xn and want to compare its mean to a reference mean μ0.

The core question is not “is x̄ different from μ0?” (it almost always is, a little), but:

Is the difference large relative to the uncertainty in the mean estimate?

The t-statistic is exactly that ratio:

t = (x̄ − μ0) / SE
SE = s / √n
df = n − 1
  • numerator: signal (x̄ − μ0)

  • denominator: noise (estimated standard error of the mean)

Because we estimate the standard deviation with s (instead of knowing σ exactly), the statistic follows a Student t distribution under the null hypothesis.
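The ratio above can be computed directly. A minimal sketch with a small illustrative sample (the numbers here are made up, not the notebook's fill-volume data):

```python
import numpy as np

# Toy sample and reference value (both illustrative).
x = np.array([4.8, 5.3, 5.1, 4.9, 5.6, 5.2])
mu0 = 5.0

n = x.size
xbar = x.mean()
s = x.std(ddof=1)        # sample standard deviation (n - 1 in the denominator)
se = s / np.sqrt(n)      # estimated standard error of the mean (the "noise")
t = (xbar - mu0) / se    # signal-to-noise ratio
df = n - 1

print(f"t = {t:.3f} with df = {df}")
```

Note `ddof=1`: NumPy's default `std` divides by n, but the t-test uses the sample standard deviation with n − 1.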

When is it used?#

Typical use cases:

  • Quality control: Is the average fill volume equal to the labeled amount?

  • Performance/SLA: Is the mean latency higher than a target threshold?

  • Science/medicine: Is the mean change from baseline different from 0?

You are comparing a single sample mean to a fixed reference value.

If you are comparing two groups (two samples), you want a two-sample t-test (or Welch’s t-test).

Assumptions (and what happens if they fail)#

The classical one-sample t-test relies on:

  1. Independence: observations are independent (no time dependence, no clustering).

  2. Approximately normal data (or large n):

    • If the population is normal, the test is exact.

    • For larger n, the Central Limit Theorem makes the sampling distribution of the mean closer to normal.

  3. No extreme outliers: outliers inflate s and can dominate the mean.

If normality is questionable and n is small, consider robust / nonparametric alternatives (e.g., sign test, Wilcoxon signed-rank test) or bootstrap confidence intervals.
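Of the alternatives mentioned above, the percentile bootstrap is easy to sketch with NumPy alone. This is a minimal illustration on made-up skewed data, not a full treatment:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=30)   # deliberately skewed sample

# Percentile bootstrap CI for the mean: resample with replacement many times
# and take empirical quantiles of the resampled means.
B = 10_000
boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean() for _ in range(B)
])
lo, hi = np.quantile(boot_means, [0.025, 0.975])
print(f"95% bootstrap CI for the mean: [{lo:.3f}, {hi:.3f}]")
```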

Hypotheses#

Choose an alternative that matches the question before looking at the data:

  • Two-sided: H0: μ = μ0 vs H1: μ ≠ μ0

  • Greater (right-tailed): H0: μ = μ0 vs H1: μ > μ0

  • Less (left-tailed): H0: μ = μ0 vs H1: μ < μ0

Test procedure (recipe)#

  1. Choose μ0, a significance level α (often 0.05), and the alternative.

  2. Compute x̄, s, and SE = s/√n.

  3. Compute t = (x̄ − μ0)/SE with df = n − 1.

  4. Compute the p-value as a tail probability under T ~ t(df).

  5. Reject H0 if p ≤ α.

  6. Report the estimate (mean), a CI for μ, and an effect size.
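The recipe, step by step, on illustrative data. Here SciPy's `stats.t.sf` supplies the exact tail area (an extra dependency; the notebook's own implementation further down stays NumPy-only):

```python
import numpy as np
from scipy import stats  # used only for the exact t tail probability

# Step 1: choose mu0, alpha, and the alternative (two-sided here).
mu0, alpha = 0.0, 0.05
x = np.array([0.4, -0.1, 0.7, 0.2, 0.5, -0.3, 0.6, 0.1])  # illustrative

# Steps 2-3: mean, sample std, SE, t-statistic, degrees of freedom.
n = x.size
xbar, s = x.mean(), x.std(ddof=1)
se = s / np.sqrt(n)
t_stat = (xbar - mu0) / se
df = n - 1

# Step 4: two-sided p-value as a tail area under t(df).
p_value = 2 * stats.t.sf(abs(t_stat), df)

# Step 5: decision.
print(f"t={t_stat:.3f}, p={p_value:.4f} ->",
      "reject H0" if p_value <= alpha else "fail to reject H0")
```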

Interpreting the result (what it means)#

The p-value is:

The probability, assuming the null hypothesis is true, of seeing a t-statistic at least as extreme as the one you observed.

So:

  • Small p-value → your data would be rare under H0 → evidence against μ = μ0.

  • Large p-value → your data is not unusual under H0 → you fail to reject H0.

Important: a large p-value does not prove μ = μ0. It usually means “not enough evidence with this sample size / noise level”.

A helpful companion is the confidence interval (CI) for μ:

  • A 95% two-sided CI is the set of means that would not be rejected by a 5% two-sided test.

  • If μ0 is outside the CI, you reject at that α.

Also separate statistical significance (p-value) from practical significance (effect size like Cohen’s d).
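The CI/test duality is easy to verify numerically. A sketch on synthetic data (SciPy's `stats.t.ppf`/`sf` are assumed here for exact quantiles; the data and seed are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.6, scale=1.0, size=25)
alpha = 0.05

n, df = x.size, x.size - 1
xbar = x.mean()
se = x.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, df)
ci = (xbar - t_crit * se, xbar + t_crit * se)

# mu0 outside the 95% CI exactly when the two-sided 5% test rejects.
for mu0 in (0.0, xbar):   # one candidate far from the estimate, one right at it
    p = 2 * stats.t.sf(abs((xbar - mu0) / se), df)
    outside = not (ci[0] <= mu0 <= ci[1])
    print(f"mu0={mu0:.2f}: p={p:.4f}, outside CI: {outside}")
```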

# Example data: fill volumes (ml). The label says 250ml.
mu0 = 250.0

# Synthetic sample: true mean slightly above 250, unknown variance
x = rng.normal(loc=252.0, scale=4.0, size=20)

print("n=", x.size)
print("mean=", float(x.mean()))
print("std=", float(x.std(ddof=1)))
x[:10]
n= 20
mean= 250.7343483866156
std= 3.124737477155004
array([252.0049, 253.195 , 250.9034, 248.4376, 250.1813, 248.0334,
       252.2406, 257.3609, 250.0312, 249.5181])
# Always visualize the sample (outliers + skewness matter)
fig = go.Figure()
fig.add_trace(
    go.Violin(
        y=x,
        box_visible=True,
        meanline_visible=True,
        points="all",
        jitter=0.2,
        name="sample",
    )
)
fig.add_shape(
    type="line",
    x0=0,
    x1=1,
    xref="paper",
    y0=mu0,
    y1=mu0,
    line=dict(color="rgba(214, 39, 40, 1)", dash="dash", width=2),
)
fig.add_annotation(
    x=0.98,
    y=mu0,
    xref="paper",
    text="μ0",
    showarrow=False,
    yshift=10,
    font=dict(color="rgba(214, 39, 40, 1)"),
)
fig.update_layout(title="Sample vs reference μ0", yaxis_title="measurement (ml)")
fig.show()

NumPy-only implementation (from scratch)#

We can compute the t-statistic exactly with NumPy.

The only tricky part (if you restrict yourself to NumPy) is computing the t-distribution tail probability (the p-value) and the t critical value for the CI.

To keep everything NumPy-only and still match the definition (“tail area under the null distribution”), we estimate these probabilities via Monte Carlo:

  • Draw many samples from T ~ t(df) using np.random.standard_t.

  • Approximate probabilities as empirical frequencies.

This is not how you’d do production statistics (you’d use a specialized library for accurate CDF/PPF), but it’s a great way to understand what the p-value is.
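Before the full implementation, a tiny standalone version of the Monte Carlo idea: estimate a two-sided tail probability for t(df) as an empirical frequency, and compare it to the exact value (SciPy is used here only for that comparison; `t_obs = 2.0` is an arbitrary illustrative value):

```python
import numpy as np
from scipy import stats  # only to compare against the exact tail area

rng = np.random.default_rng(0)
df, t_obs = 19, 2.0

# Draw many samples from T ~ t(df) and count how often |T| >= |t_obs|.
t_null = rng.standard_t(df, size=500_000)
p_mc = float(np.mean(np.abs(t_null) >= t_obs))   # empirical frequency
p_exact = 2 * stats.t.sf(t_obs, df)              # exact two-sided tail area

print(f"Monte Carlo: {p_mc:.4f}, exact: {p_exact:.4f}")
```

With 500,000 draws the Monte Carlo standard error of this estimate is well under 0.001, so the two values agree to a few decimal places.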

def student_t_pdf(x: np.ndarray, df: int) -> np.ndarray:
    """Student t PDF computed from the definition (NumPy + standard library only)."""
    x = np.asarray(x, dtype=float)
    df = int(df)
    if df <= 0:
        raise ValueError("df must be a positive integer")

    log_norm = math.lgamma((df + 1) / 2) - (
        0.5 * math.log(df * math.pi) + math.lgamma(df / 2)
    )
    return np.exp(log_norm) * (1 + (x**2) / df) ** (-(df + 1) / 2)


def normal_pdf(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    return (1 / np.sqrt(2 * np.pi)) * np.exp(-0.5 * x**2)


def ttest_1samp_numpy(
    x: np.ndarray,
    mu0: float,
    *,
    alternative: str = "two-sided",
    alpha: float = 0.05,
    n_mc: int = 300_000,
    seed: int = 123,
) -> dict:
    """One-sample t-test with a NumPy-only Monte Carlo p-value.

    Parameters
    - x: sample (1D array)
    - mu0: null mean
    - alternative: 'two-sided', 'greater', 'less'
    - alpha: significance level for CI and decision
    - n_mc: Monte Carlo sample size for approximating p-value and t critical values
    """
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]

    n = int(x.size)
    if n < 2:
        raise ValueError("Need at least 2 non-NaN observations.")

    df = n - 1
    mean = float(x.mean())
    s = float(x.std(ddof=1))
    se = s / np.sqrt(n) if s > 0 else 0.0

    if se == 0.0:
        t_stat = float(np.inf * np.sign(mean - mu0) if mean != mu0 else 0.0)
        p_value = float(0.0 if mean != mu0 else 1.0)
        t_crit = float("nan")
        ci = (mean, mean)
        cohen_d = float(np.inf * np.sign(mean - mu0) if mean != mu0 else 0.0)
        decision = "reject H0" if p_value <= alpha else "fail to reject H0"
        return {
            "n": n,
            "df": df,
            "mu0": float(mu0),
            "mean": mean,
            "std": s,
            "se": se,
            "t_stat": t_stat,
            "p_value": p_value,
            "alpha": float(alpha),
            "alternative": alternative,
            "t_crit": t_crit,
            "ci": (float(ci[0]), float(ci[1])),
            "cohen_d": cohen_d,
            "decision": decision,
            "mc_samples": int(n_mc),
            "mc_seed": int(seed),
        }

    t_stat = float((mean - mu0) / se)
    cohen_d = float((mean - mu0) / s)

    rng_local = np.random.default_rng(seed)
    t_null = rng_local.standard_t(df, size=int(n_mc))

    if alternative == "two-sided":
        p_value = float(np.mean(np.abs(t_null) >= abs(t_stat)))
        t_crit = float(np.quantile(t_null, 1 - alpha / 2))
        ci = (mean - t_crit * se, mean + t_crit * se)
    elif alternative == "greater":
        p_value = float(np.mean(t_null >= t_stat))
        t_crit = float(np.quantile(t_null, 1 - alpha))
        ci = (mean - t_crit * se, np.inf)
    elif alternative == "less":
        p_value = float(np.mean(t_null <= t_stat))
        t_crit = float(np.quantile(t_null, 1 - alpha))
        ci = (-np.inf, mean + t_crit * se)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    decision = "reject H0" if p_value <= alpha else "fail to reject H0"

    return {
        "n": n,
        "df": df,
        "mu0": float(mu0),
        "mean": mean,
        "std": s,
        "se": float(se),
        "t_stat": t_stat,
        "p_value": p_value,
        "alpha": float(alpha),
        "alternative": alternative,
        "t_crit": t_crit,
        "ci": (float(ci[0]), float(ci[1])),
        "cohen_d": cohen_d,
        "decision": decision,
        "mc_samples": int(n_mc),
        "mc_seed": int(seed),
    }
res = ttest_1samp_numpy(x, mu0, alternative="two-sided", alpha=0.05, n_mc=500_000, seed=42)

print(f"n={res['n']}, mean={res['mean']:.3f}, std={res['std']:.3f}, SE={res['se']:.3f}")
print(f"t={res['t_stat']:.3f} (df={res['df']}), p≈{res['p_value']:.4f}, alpha={res['alpha']}")
print(f"95% CI for μ: [{res['ci'][0]:.3f}, {res['ci'][1]:.3f}]")
print(f"Cohen's d (one-sample): {res['cohen_d']:.3f}")
print("Decision:", res["decision"])
n=20, mean=250.734, std=3.125, SE=0.699
t=1.051 (df=19), p≈0.3066, alpha=0.05
95% CI for μ: [249.270, 252.199]
Cohen's d (one-sample): 0.235
Decision: fail to reject H0
# Optional validation against SciPy (production-grade distribution functions)
try:
    from scipy.stats import ttest_1samp

    scipy_res = ttest_1samp(x, popmean=mu0, alternative="two-sided")
    print("SciPy t:", float(scipy_res.statistic))
    print("SciPy p:", float(scipy_res.pvalue))
except Exception as e:
    print("SciPy check skipped:", e)
SciPy t: 1.0510021553137605
SciPy p: 0.30644215697136534

Visual: the p-value is a tail area#

For a two-sided test, the p-value is the probability (under H0) of seeing |T| ≥ |t_obs|.

That probability corresponds to the red shaded area below.

t_obs = res["t_stat"]
df = res["df"]
tcrit = res["t_crit"]

xmax = max(6.0, abs(t_obs) + 1.0, abs(tcrit) + 1.0)
xx = np.linspace(-xmax, xmax, 3001)
yy = student_t_pdf(xx, df)

abs_t = abs(t_obs)
mask_left = xx <= -abs_t
mask_right = xx >= abs_t

fig = go.Figure()
fig.add_trace(go.Scatter(x=xx, y=yy, mode="lines", name=f"t pdf (df={df})"))
fig.add_trace(
    go.Scatter(
        x=xx[mask_left],
        y=yy[mask_left],
        mode="lines",
        line=dict(width=0),
        fill="tozeroy",
        name="p-value tail",
        showlegend=False,
        fillcolor="rgba(214, 39, 40, 0.35)",
    )
)
fig.add_trace(
    go.Scatter(
        x=xx[mask_right],
        y=yy[mask_right],
        mode="lines",
        line=dict(width=0),
        fill="tozeroy",
        showlegend=False,
        fillcolor="rgba(214, 39, 40, 0.35)",
    )
)

ymax = float(yy.max())
for xline, dash, color, width in [
    (t_obs, "dash", "rgba(214, 39, 40, 1)", 2),
    (tcrit, "dot", "rgba(0, 0, 0, 0.6)", 1),
    (-tcrit, "dot", "rgba(0, 0, 0, 0.6)", 1),
]:
    fig.add_shape(
        type="line",
        x0=xline,
        x1=xline,
        y0=0,
        y1=ymax,
        xref="x",
        yref="y",
        line=dict(color=color, dash=dash, width=width),
    )

fig.update_layout(
    title=f"Two-sided p-value as tail area (t={t_obs:.3f}, p≈{res['p_value']:.4f})",
    xaxis_title="t",
    yaxis_title="density",
)
fig.show()

Visual: degrees of freedom control tail heaviness#

For small df, the t distribution has heavier tails than the standard normal.

As df → ∞, the t distribution approaches the standard normal distribution.

dfs = [1, 2, 5, 10, 30, 100]
xx = np.linspace(-5, 5, 2001)

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=xx,
        y=normal_pdf(xx),
        mode="lines",
        name="Normal(0,1)",
        line=dict(color="black", dash="dash"),
    )
)

for i, df_ in enumerate(dfs):
    fig.add_trace(
        go.Scatter(
            x=xx,
            y=student_t_pdf(xx, df_),
            mode="lines",
            name=f"t(df={df_})",
            visible=(i == 0),
        )
    )

steps = []
for i, df_ in enumerate(dfs):
    visible = [True] + [False] * len(dfs)
    visible[1 + i] = True
    steps.append(
        dict(
            method="update",
            args=[{"visible": visible}, {"title": f"Student t vs Normal — df={df_}"}],
            label=str(df_),
        )
    )

fig.update_layout(
    title=f"Student t vs Normal — df={dfs[0]}",
    xaxis_title="x",
    yaxis_title="density",
    sliders=[
        dict(
            active=0,
            currentvalue={"prefix": "df: "},
            pad={"t": 30},
            steps=steps,
        )
    ],
)
fig.show()

Visual: under H0, the computed t-statistic follows a t distribution#

If the data really comes from a normal population with mean μ0, then the statistic

t = (x̄ − μ0) / (s / √n)

has a t(df=n−1) distribution.

We can check that by simulation.

rng_sim = np.random.default_rng(123)

B = 50_000
n = res["n"]
df = n - 1

mu0_sim = 0.0
x0 = rng_sim.normal(loc=mu0_sim, scale=1.0, size=(B, n))
t_stats = (x0.mean(axis=1) - mu0_sim) / (x0.std(axis=1, ddof=1) / np.sqrt(n))

xx = np.linspace(-6, 6, 2001)
yy = student_t_pdf(xx, df)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=t_stats,
        nbinsx=80,
        histnorm="probability density",
        name="Simulated t-stat",
        opacity=0.65,
    )
)
fig.add_trace(
    go.Scatter(x=xx, y=yy, mode="lines", name=f"t pdf (df={df})", line=dict(width=3))
)

fig.update_layout(
    barmode="overlay",
    title=f"t-statistic under H0 matches t(df={df})",
    xaxis_title="t",
    yaxis_title="density",
)
fig.show()

Visual: power depends on effect size and sample size#

Power is the probability of rejecting H0 when H1 is true.

If the true mean is μ = μ0 + δ, power increases when:

  • |δ| is larger (bigger effect)

  • n is larger (smaller SE)

  • α is larger (more aggressive rejection rule)

Below we estimate power by simulation for a two-sided test at α=0.05.

alpha = 0.05

# Generic setup for power: mu0=0, sigma=1 so δ is an effect size in "sigma units"
mu0_power = 0.0
sigma_power = 1.0
deltas = [0.2, 0.5, 0.8]

n_grid = np.array([5, 8, 12, 20, 30, 40, 60, 80, 100])

B_power = 12_000
B_crit = 120_000

rng_power = np.random.default_rng(202)

power = {d: [] for d in deltas}

for n in n_grid:
    df = n - 1
    t_null = rng_power.standard_t(df, size=B_crit)
    tcrit = float(np.quantile(t_null, 1 - alpha / 2))

    for d in deltas:
        x_alt = rng_power.normal(
            loc=mu0_power + d * sigma_power,
            scale=sigma_power,
            size=(B_power, n),
        )
        t_alt = (x_alt.mean(axis=1) - mu0_power) / (
            x_alt.std(axis=1, ddof=1) / np.sqrt(n)
        )
        power[d].append(float(np.mean(np.abs(t_alt) >= tcrit)))

fig = go.Figure()
for d in deltas:
    fig.add_trace(
        go.Scatter(
            x=n_grid,
            y=power[d],
            mode="lines+markers",
            name=f"δ={d}σ",
        )
    )

fig.update_layout(
    title="Estimated power vs sample size (two-sided t-test, α=0.05)",
    xaxis_title="n",
    yaxis_title="power",
    yaxis=dict(range=[0, 1]),
)
fig.show()

Pitfalls and practical notes#

  • A p-value is not the “probability that H0 is true”. It is P(data at least this extreme | H0).

  • Failing to reject does not prove equality; it often means low power.

  • Outliers can strongly affect the mean and inflate s. Always plot the data.

  • Independence is crucial. If you have time series dependence or repeated measures, use the right model/test.

  • If you test many hypotheses, consider multiple testing corrections.

Exercises#

  1. Change alternative to 'greater' and 'less'. How does the CI change?

  2. Make n smaller (e.g., 5) and rerun. What happens to the t critical value and p-value stability?

  3. Add a single extreme outlier to x and rerun. What happens to the mean, s, and the decision?

  4. Compare the Monte Carlo p-value to SciPy while increasing n_mc. How fast does it converge?

References#

  • Standard introductory statistics texts: Student’s one-sample t-test

  • SciPy: scipy.stats.ttest_1samp